DCP-NAS: Discrepant Child-Parent Neural Architecture Search for 1-Bit CNNs

TABLE 4.4

Comparison on the ImageNet dataset of the distance calculation methods used in DCP-NAS to constrain the gradient of binary NAS in the tangent direction, i.e., Eq. 4.31. We use the small model, DCP-NAS-S, to evaluate the searched architecture.

Method              Top-1 Acc. (%)   Top-5 Acc. (%)   Memory (MBits)   Search Cost
Cosine similarity   62.5             83.9             4.2              2.9
L1-norm             62.7             84.3             4.3              2.9
F-norm              63.0             84.5             4.2              2.9

much smaller performance gap with real-valued NAS, at a clearly lower search cost. We conduct ablation experiments on different methods for calculating the architecture discrepancy to further clarify tangent propagation. As shown in Table 4.4, the F-norm applied in Eq. 4.31 achieves the best performance, while cosine similarity and the L1-norm are less effective.
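To make the three discrepancy measures in Table 4.4 concrete, the sketch below compares them on a pair of architecture-parameter tensors. This is a minimal illustration, not the DCP-NAS implementation: the tensors `alpha_parent` and `alpha_child` are hypothetical stand-ins for the quantities whose distance is constrained in Eq. 4.31.

```python
import numpy as np

def architecture_discrepancy(alpha_parent, alpha_child, metric="fro"):
    """Distance between hypothetical parent/child architecture parameters.

    A sketch of the three measures ablated in Table 4.4; "fro" (the
    Frobenius norm) is the variant reported as best-performing.
    """
    diff = alpha_parent - alpha_child
    if metric == "fro":
        # Frobenius norm: sqrt of the sum of squared element-wise differences
        return np.linalg.norm(diff, ord="fro")
    if metric == "l1":
        # Element-wise L1-norm of the difference
        return np.abs(diff).sum()
    if metric == "cosine":
        # 1 - cosine similarity of the flattened parameter tensors
        p, c = alpha_parent.ravel(), alpha_child.ravel()
        return 1.0 - (p @ c) / (np.linalg.norm(p) * np.linalg.norm(c))
    raise ValueError(f"unknown metric: {metric}")

# Example: identical tensors have zero cosine distance, while the
# F-norm and L1-norm grow with the magnitude of the difference.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.zeros((2, 2))
print(architecture_discrepancy(a, b, "fro"))     # sqrt(1+4+9+16) ≈ 5.477
print(architecture_discrepancy(a, b, "l1"))      # 10.0
print(architecture_discrepancy(a, a, "cosine"))  # ≈ 0.0
```

Unlike cosine similarity, which is scale-invariant, the F-norm penalizes both directional and magnitude differences, which is one plausible reading of why it constrains the tangent-direction gradient more effectively in Table 4.4.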